N=1:

rank | frequency | n-gram |
---|---|---|
1 | 4343 | -는 |
2 | 3659 | -을 |
3 | 3547 | -의 |
4 | 2863 | -에 |
5 | 2824 | -다 |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 1353 | -에서 |
2 | 1096 | -으로 |
3 | 530 | -하고 |
4 | 529 | -하는 |
5 | 473 | -했다 |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 190 | -다”고 |
2 | 186 | -에서는 |
3 | 155 | -적으로 |
4 | 150 | -이라고 |
5 | 142 | -다"고 |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 40 | -다"면서 |
2 | 36 | -”이라고 |
3 | 32 | -으로부터 |
4 | 31 | -하겠다는 |
5 | 31 | -겠다”고 |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 16 | -(한국시간 |
2 | 14 | -하겠다”고 |
3 | 14 | -하겠다"고 |
4 | 11 | -a.com |
5 | 9 | -체육관에서 |

The tables show the most frequent letter N-grams at the end of words for N=1…5. The procedure parallels 2.2.5 Most frequent word beginnings; the aim here is suffix detection rather than prefix detection.
For N=3:
SELECT @pos:=(@pos+1), xx.* FROM (SELECT @pos:=0) r, (SELECT COUNT(*) AS cnt, CONCAT("-", RIGHT(word,3)) FROM words WHERE w_id>100 GROUP BY RIGHT(word,3) ORDER BY cnt DESC) xx LIMIT 5;
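
For readers without access to the database, the same ranking can be sketched in Python. This is only an illustration of what the query above computes, not the code used to build the tables. The function name `top_word_endings` and the `(w_id, word)` row layout are assumptions made for this example, and the sample rows are invented. As in the query, each row of the word list counts once and rows with `w_id` of 100 or less are skipped.

```python
from collections import Counter


def top_word_endings(rows, n, limit=5, min_id=100):
    """Rank the most frequent n-character word endings.

    rows   -- iterable of (w_id, word) pairs (hypothetical layout of the words table)
    n      -- ending length in characters, n >= 1
    limit  -- number of ranks to return
    min_id -- rows with w_id <= min_id are ignored, mirroring "WHERE w_id>100"
    """
    counts = Counter()
    for w_id, word in rows:
        if w_id <= min_id:
            continue
        # Like RIGHT(word, n), a word shorter than n is kept whole.
        counts["-" + word[-n:]] += 1
    # Attach ranks starting at 1, as the @pos counter does in the query.
    return [
        (rank, cnt, ending)
        for rank, (ending, cnt) in enumerate(counts.most_common(limit), start=1)
    ]


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [(1, "x"), (101, "학교에서"), (102, "집에서"), (103, "서울로"), (104, "에서")]
    for rank, cnt, ending in top_word_endings(sample, 2):
        print(rank, cnt, ending)
```

Changing `n` reproduces the other table sizes. Python slices by Unicode code point, so the results match the query only if the stored words use precomposed Hangul syllables.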